Blasting through lattice calculations using CUDA
Abstract
Modern graphics hardware is designed for highly parallel numerical tasks and provides significant cost and performance benefits. Graphics hardware vendors now provide development tools to support general-purpose high-performance computing. Nvidia’s CUDA platform, in particular, offers direct access to graphics hardware through a programming language similar to C. Using the CUDA platform we have implemented a Wilson-Dirac operator which runs at an effective 68 Gflops on the Tesla C870. The recently released GeForce GTX 280 runs this same code at 92 Gflops, and we expect further improvement pending code optimization.
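To make the data-access pattern concrete, the following is a minimal sketch of a nearest-neighbour hopping stencil on a periodic four-dimensional lattice. It is not the authors' implementation: it uses a real scalar field and omits the SU(3) gauge links and spin projectors that a true Wilson-Dirac operator applies to each neighbour term. The names `site_index` and `hopping_term` and the 4^4 lattice size are hypothetical, chosen only for illustration.

```python
from itertools import product


def site_index(coords, dims):
    """Map lattice coordinates to a flat index (last coordinate runs fastest)."""
    idx = 0
    for c, n in zip(coords, dims):
        idx = idx * n + c
    return idx


def hopping_term(field, dims):
    """For every site, sum the field over its 2 * len(dims) nearest neighbours.

    Simplifying assumption: a scalar field with unit weights. A real
    Wilson-Dirac operator would multiply each neighbour by a gauge link
    and a spin projector before accumulating.
    """
    out = [0.0] * len(field)
    for coords in product(*(range(n) for n in dims)):
        total = 0.0
        for mu, n in enumerate(dims):
            for step in (1, -1):
                neighbour = list(coords)
                # Periodic boundary conditions: wrap around the lattice edge.
                neighbour[mu] = (neighbour[mu] + step) % n
                total += field[site_index(neighbour, dims)]
        out[site_index(coords, dims)] = total
    return out


dims = (4, 4, 4, 4)   # hypothetical lattice size
volume = 4 ** 4

# A constant field: every site collects the same value from all 8 neighbours.
constant = hopping_term([1.0] * volume, dims)

# A point source at the origin spreads to its 8 nearest neighbours only.
source = [0.0] * volume
source[0] = 1.0
spread = hopping_term(source, dims)
```

Each output site depends only on reads from the input field, so the outer loop over sites has no dependencies between iterations. That independence is what makes this kind of operator a natural candidate for mapping sites to parallel GPU threads.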
Similar resources
Lattice Simulations using OpenACC compilers
OpenACC compilers allow one to use Graphics Processing Units without having to write explicit CUDA code. Programs can be modified incrementally using OpenMP-like directives, which cause the compiler to generate CUDA kernels to run on the GPUs. In this article we look at the performance gain in lattice simulations with dynamical fermions using OpenACC compilers.
Implementing the Dslash Operator in OpenCL (Technical Report WM-CS-2010-03, College of William & Mary, Department of Computer Science)
The Dslash operator is used in Lattice Quantum Chromodynamics (LQCD) applications to implement a Wilson-Dirac sparse matrix-vector product. Typically the Dslash operation has been implemented as a parallel program. Today’s Graphics Processing Units (GPU) are designed to do highly parallel numerical calculations for 3D graphics rendering. This design works well with scientific applications such ...
Data-Parallelism and GPUs for Lattice Gas Fluid Simulations
Lattice gas cellular automata (LGCA) models provide a relatively fast means of simulating fluid flow and can give both quantitative and qualitative insights into flow patterns around complex obstacles. Symmetry requirements inherent in the Navier-Stokes equation mandate that lattice-gas approximations to the full field equations be run on triangular lattices in two dimensions and on a 3-D proje...
A Parallel Jacobi-type Lattice Basis Reduction Algorithm
This paper describes a parallel Jacobi method for lattice basis reduction and a GPU implementation using CUDA. Our experiments have shown that the parallel implementation is more than fifty times as fast as the serial counterpart, which is twice as fast as the well-known LLL lattice reduction algorithm.
A GPU Implementation of a Jacobi Method for Lattice Basis Reduction
This paper describes a parallel Jacobi method for lattice basis reduction and a GPU implementation using CUDA. Our experiments have shown that the parallel implementation is more than fifty times as fast as the serial counterpart, which is about twice as fast as the well-known LLL lattice reduction algorithm.